{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Importing xlsx files"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Basic import"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "DF1 = pd.read_excel(\"./train.xlsx\")\n",
    "# DF1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Specify which sheet to import\n",
    "DF2 = pd.read_excel(\"train.xlsx\", sheet_name = \"train2\")\n",
    "# DF2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Specifying row and column indexes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`index_col` specifies which column of the xlsx file to use as the row index;\n",
    "columns are counted from 0."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "DF3 = pd.read_excel(r\"./train.xlsx\", \n",
    "                    sheet_name = \"train2\", \n",
    "                    index_col=2)\n",
    "# DF3"
   ]
  },
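  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When a local file is imported into a DataFrame, the first row of the source table is used as the column index by default.  \n",
    "You can choose a different row with the `header` parameter.  \n",
    "`header` defaults to 0, meaning the first row is used as the column index.  "
   ]
  },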
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Use the 11th row as the column index (header is zero-based, so header=10)\n",
    "DF4 = pd.read_excel(r\"./train.xlsx\",\n",
    "                    sheet_name = \"train2\", \n",
    "                    header=10)\n",
    "# DF4"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Importing specific columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Older pandas accepted usecols=2 for this; recent versions require a list.\n",
    "DF5 = pd.read_excel(r\"train.xlsx\", \n",
    "                    sheet_name = \"train2\", \n",
    "                    usecols=[0, 1, 2])  # import columns 0, 1 and 2\n",
    "# DF5"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`usecols` also accepts a list of values naming exactly which columns to import."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Import only the columns at index 2 and 5, not the range of columns 2 through 5\n",
    "DF6 = pd.read_excel(r\"train.xlsx\", \n",
    "                    sheet_name = \"train2\", \n",
    "                    usecols=[2, 5])\n",
    "# DF6"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Importing csv files"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Direct import"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "DF1 = pd.read_csv(\"zhihu.csv\",encoding=\"ANSI\")\n",
    "# DF1"
   ]
  },
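  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Common encodings:** \n",
    "- utf-8  \n",
    "- gbk  \n",
    "- iso-8859-1  \n",
    "- ascii  \n",
    "- ANSI (Windows only: the system's active code page)\n",
    "\n",
    "**Changing a CSV file's encoding:**  \n",
    "Open the CSV file in Notepad, choose File > Save As, and select the target encoding (for example, gbk)."
   ]
  },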
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Specifying the delimiter\n",
    "Common delimiters: space, comma, tab (\\t)  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Using a space as the delimiter:\n",
    "# DF22 = pd.read_csv(\"zhihu.csv\", encoding=\"ANSI\", sep=\" \")\n",
    "# zhihu.csv is not space-separated, so the call above raises an error;\n",
    "# it is left commented out."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Using a comma as the delimiter (the default for csv files)\n",
    "# zhihu.csv is comma-separated, so it is parsed correctly\n",
    "DF22 = pd.read_csv(r\"zhihu.csv\", encoding=\"ANSI\", sep=\",\")\n",
    "# DF22"
   ]
  },
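  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Limiting the number of rows read\n",
    "Suppose you have a file of several hundred megabytes and want to see what data it contains.  \n",
    "There is no need to import all of it; looking at the first few rows is enough.  \n",
    "Set the `nrows` parameter to do this:"
   ]
  },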
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>_id</th>\n",
       "      <th>n_folder</th>\n",
       "      <th>n_follow</th>\n",
       "      <th>n_followed</th>\n",
       "      <th>n_question</th>\n",
       "      <th>n_topic</th>\n",
       "      <th>n_zhuanlan</th>\n",
       "      <th>job1</th>\n",
       "      <th>job2</th>\n",
       "      <th>ans_v</th>\n",
       "      <th>share_v</th>\n",
       "      <th>ask_v</th>\n",
       "      <th>store_v</th>\n",
       "      <th>个人简介</th>\n",
       "      <th>居住地</th>\n",
       "      <th>所在行业</th>\n",
       "      <th>收藏夹</th>\n",
       "      <th>教育经历</th>\n",
       "      <th>职业经历</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>587598f89f11daf90617fb7a</td>\n",
       "      <td>52</td>\n",
       "      <td>17</td>\n",
       "      <td>1</td>\n",
       "      <td>30</td>\n",
       "      <td>58</td>\n",
       "      <td>2</td>\n",
       "      <td>交通仓储</td>\n",
       "      <td>邮政</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>邮政</td>\n",
       "      <td>ooxx</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>587598f89f11daf90617fb7c</td>\n",
       "      <td>27</td>\n",
       "      <td>73</td>\n",
       "      <td>15</td>\n",
       "      <td>87</td>\n",
       "      <td>26</td>\n",
       "      <td>1</td>\n",
       "      <td>高新科技</td>\n",
       "      <td>互联网</td>\n",
       "      <td>56</td>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "      <td>14</td>\n",
       "      <td>NaN</td>\n",
       "      <td>重庆</td>\n",
       "      <td>互联网</td>\n",
       "      <td>ooxx</td>\n",
       "      <td>重庆邮电大学</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                        _id  n_folder  n_follow  n_followed  n_question  \\\n",
       "0  587598f89f11daf90617fb7a        52        17           1          30   \n",
       "1  587598f89f11daf90617fb7c        27        73          15          87   \n",
       "\n",
       "   n_topic  n_zhuanlan  job1 job2  ans_v  share_v  ask_v  store_v  个人简介  居住地  \\\n",
       "0       58           2  交通仓储   邮政      0        0      0        3   NaN  NaN   \n",
       "1       26           1  高新科技  互联网     56        0      4       14   NaN   重庆   \n",
       "\n",
       "  所在行业   收藏夹    教育经历  职业经历  \n",
       "0   邮政  ooxx     NaN   NaN  \n",
       "1  互联网  ooxx  重庆邮电大学   NaN  "
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "DF23 = pd.read_csv(\"zhihu.csv\", encoding=\"ansi\", sep=\",\", nrows=2)\n",
    "DF23"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Specifying the encoding \n",
    "Set the `encoding` parameter to choose the encoding used when importing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "DF24 = pd.read_csv(r\"zhihu.csv\", encoding=\"ansi\")\n",
    "# DF24"
   ]
  },
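  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Specifying the engine\n",
    "When the file path or file name contains Chinese characters, the import can fail; setting the `engine` parameter works around the error.  \n",
    "Cause of the error:\n",
    "- `read_csv()` parses with the C engine by default; switching to `engine=\"python\"` is enough to fix it.  \n",
    "    1. If the file was saved as CSV UTF-8 (Comma delimited) (*.csv), the encoding must also be changed to utf-8-sig;  \n",
    "    2. If the file was saved as CSV (Comma delimited) (*.csv), the corresponding encoding is gbk.  "
   ]
  },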
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Without engine (the default C parser is used), this raises an error\n",
    "# DF25 = pd.read_csv(r\"C:\\Users\\sdlr.sdlr-PC\\python3\\数据分析\\dataSourceFiles\\球员批量导入模板20190525.csv\", encoding=\"gbk\")\n",
    "\n",
    "# With engine=\"python\" (the Python parser is used)\n",
    "DF25 = pd.read_csv(r\"C:\\Users\\sdlr.sdlr-PC\\python3\\数据分析\\dataSourceFiles\\球员批量导入模板20190525.csv\", engine=\"python\", encoding=\"gbk\")\n",
    "# DF25"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Importing txt files"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "read_table() is a general-purpose function for importing delimiter-separated files into a DataFrame.  \n",
    "It can import csv files as well as txt files."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "DF3 = pd.read_table(r\"C:\\Users\\sdlr.sdlr-PC\\python3\\数据分析\\dataSourceFiles\\ST_002210.txt\", encoding=\"gbk\",engine=\"python\")\n",
    "# DF3"
   ]
  },
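  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When importing a csv file with read_table(), you must pass the delimiter through `sep` even if the file is comma-separated, because read_table() defaults to a tab delimiter.  \n",
    "With read_csv(), the comma delimiter does not need to be specified."
   ]
  },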
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Importing SQL data\n",
    "Importing SQL data with Python takes two main steps:  \n",
    "1. Connect Python to the database;  \n",
    "2. Execute the SQL statement from Python."
   ]
  },
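  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Python connects to the database through the pymysql module, which must be installed manually. To install it:  \n",
    "1. Locate the directory that contains pip:  \n",
    "`where pip`  \n",
    "2. Install pymysql:  \n",
    "`pip install pymysql`"
   ]
  },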
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>sysId</th>\n",
       "      <th>IDNo</th>\n",
       "      <th>agentBeginTime</th>\n",
       "      <th>agentEndTime</th>\n",
       "      <th>assoSysId</th>\n",
       "      <th>associationId</th>\n",
       "      <th>auditStatus</th>\n",
       "      <th>birthPlace</th>\n",
       "      <th>birthday</th>\n",
       "      <th>bloodType</th>\n",
       "      <th>...</th>\n",
       "      <th>teamId</th>\n",
       "      <th>transferSituation</th>\n",
       "      <th>weight</th>\n",
       "      <th>years</th>\n",
       "      <th>playerpositions</th>\n",
       "      <th>clubSysId</th>\n",
       "      <th>ytcSysId</th>\n",
       "      <th>deputyId</th>\n",
       "      <th>footballOrgCode</th>\n",
       "      <th>boneAgeResult</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1009265662469603328</td>\n",
       "      <td>34678909876543</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1009261008365027300</td>\n",
       "      <td>None</td>\n",
       "      <td>2</td>\n",
       "      <td>山东省,青岛市,李沧区</td>\n",
       "      <td>1980-02-07 00:00:00</td>\n",
       "      <td></td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>34.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1009267469954580480</td>\n",
       "      <td>39287372838382</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1009261008365027300</td>\n",
       "      <td>None</td>\n",
       "      <td>2</td>\n",
       "      <td>山东省,德州市,庆云县</td>\n",
       "      <td>2018-06-01 00:00:00</td>\n",
       "      <td>B</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>500.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1009267737471483904</td>\n",
       "      <td>610103198702040416</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1009262229498888200</td>\n",
       "      <td>None</td>\n",
       "      <td>2</td>\n",
       "      <td>陕西省,西安市,碑林区</td>\n",
       "      <td>1987-02-04 00:00:00</td>\n",
       "      <td>B</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>80.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1009267990966829056</td>\n",
       "      <td>234567765432345678765</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1009261008365027300</td>\n",
       "      <td>None</td>\n",
       "      <td>2</td>\n",
       "      <td>山东省,青岛市,李沧区</td>\n",
       "      <td>1999-02-26 00:00:00</td>\n",
       "      <td></td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>123.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1009269512643543040</td>\n",
       "      <td>220106198711199011</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1009260988244951000</td>\n",
       "      <td>None</td>\n",
       "      <td>1</td>\n",
       "      <td>吉林省,长春市,宽城区</td>\n",
       "      <td>2018-05-03 00:00:00</td>\n",
       "      <td>AB</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>60.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>1009270902338093056</td>\n",
       "      <td>230502198504021358</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1009262105175523300</td>\n",
       "      <td>None</td>\n",
       "      <td>2</td>\n",
       "      <td>黑龙江省,双鸭山市,尖山区</td>\n",
       "      <td>1985-04-02 00:00:00</td>\n",
       "      <td>B</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>600.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>1009272806745047040</td>\n",
       "      <td>6132232222</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1009265808943087600</td>\n",
       "      <td>None</td>\n",
       "      <td>1</td>\n",
       "      <td>河北省,石家庄市,裕华区</td>\n",
       "      <td>1992-07-16 00:00:00</td>\n",
       "      <td>A</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>72.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>1009276079812116480</td>\n",
       "      <td>123456789012345678</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1009262105175523300</td>\n",
       "      <td>None</td>\n",
       "      <td>4</td>\n",
       "      <td>内蒙古自治区,呼和浩特市,新城区</td>\n",
       "      <td>2018-06-01 00:00:00</td>\n",
       "      <td>O</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>80.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>23</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>1009277120326664192</td>\n",
       "      <td>150123199909091234</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1009262105175523300</td>\n",
       "      <td>None</td>\n",
       "      <td>4</td>\n",
       "      <td>内蒙古自治区,呼和浩特市,新城区</td>\n",
       "      <td>1999-09-09 00:00:00</td>\n",
       "      <td>-</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>80.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>合格</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>1009643157631008768</td>\n",
       "      <td>1234567890123</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1009604003606036500</td>\n",
       "      <td>None</td>\n",
       "      <td>2</td>\n",
       "      <td>广东省,深圳市,福田区</td>\n",
       "      <td>1978-03-27 00:00:00</td>\n",
       "      <td>O</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>70.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>1009643421545005056</td>\n",
       "      <td>11111111111111777777111</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1009636512146391000</td>\n",
       "      <td>None</td>\n",
       "      <td>4</td>\n",
       "      <td>江西省,景德镇市,昌江区</td>\n",
       "      <td>2018-06-01 00:00:00</td>\n",
       "      <td></td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>90.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>1009987242078240768</td>\n",
       "      <td>111111111111111111111111111111</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1009980810834477000</td>\n",
       "      <td>None</td>\n",
       "      <td>4</td>\n",
       "      <td>安徽省,蚌埠市,淮上区</td>\n",
       "      <td>2018-06-01 00:00:00</td>\n",
       "      <td>A</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>80.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>1009987558286819328</td>\n",
       "      <td>210203199010164798</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1009981097095725000</td>\n",
       "      <td>None</td>\n",
       "      <td>4</td>\n",
       "      <td>辽宁省,大连市,西岗区</td>\n",
       "      <td>1990-10-16 00:00:00</td>\n",
       "      <td>A</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>74.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>1009989471610863616</td>\n",
       "      <td>21020319850501221X</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1009980874990551000</td>\n",
       "      <td>None</td>\n",
       "      <td>4</td>\n",
       "      <td>辽宁省,大连市,西岗区</td>\n",
       "      <td>1985-05-05 00:00:00</td>\n",
       "      <td>A</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>69.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>1009990012088877056</td>\n",
       "      <td>330103199701018888</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1009980874990551000</td>\n",
       "      <td>None</td>\n",
       "      <td>4</td>\n",
       "      <td>浙江省,杭州市,余杭区</td>\n",
       "      <td>1997-01-01 00:00:00</td>\n",
       "      <td>AB</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>80.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>1009990983330299904</td>\n",
       "      <td>210204198703160492</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1009984612115415000</td>\n",
       "      <td>None</td>\n",
       "      <td>2</td>\n",
       "      <td>辽宁省,大连市,沙河口区</td>\n",
       "      <td>1987-08-16 00:00:00</td>\n",
       "      <td></td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>76.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>1009992440523784192</td>\n",
       "      <td>330681198606104853</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1009980874990551000</td>\n",
       "      <td>None</td>\n",
       "      <td>4</td>\n",
       "      <td>浙江省,绍兴市,诸暨市</td>\n",
       "      <td>1986-06-10 00:00:00</td>\n",
       "      <td>O</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>64.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>S20002</td>\n",
       "      <td>通过</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>1009992532274184192</td>\n",
       "      <td>110000198701010001</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1009989650657312800</td>\n",
       "      <td>None</td>\n",
       "      <td>4</td>\n",
       "      <td>广东省,广州市,荔湾区</td>\n",
       "      <td>1987-01-01 00:00:00</td>\n",
       "      <td>A</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>100.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>1009992843109859328</td>\n",
       "      <td>533221199703090330</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1009976772244013000</td>\n",
       "      <td>None</td>\n",
       "      <td>4</td>\n",
       "      <td>云南省,丽江市,玉龙纳西族自治县</td>\n",
       "      <td>1997-03-09 00:00:00</td>\n",
       "      <td>B</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>144.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>18</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>1009993119082479616</td>\n",
       "      <td>201212888895658</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1009989650657312800</td>\n",
       "      <td>None</td>\n",
       "      <td>4</td>\n",
       "      <td>广东省,韶关市,浈江区</td>\n",
       "      <td>1988-04-01 00:00:00</td>\n",
       "      <td>O</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>85.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>1009993391317975040</td>\n",
       "      <td>ss</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1009985882989527000</td>\n",
       "      <td>None</td>\n",
       "      <td>1</td>\n",
       "      <td>天津市,天津市市辖区,河东区</td>\n",
       "      <td>2018-06-04 00:00:00</td>\n",
       "      <td>A</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>22.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>1009994098582487040</td>\n",
       "      <td>510108199310172412</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1009985052848685000</td>\n",
       "      <td>None</td>\n",
       "      <td>4</td>\n",
       "      <td>四川省,成都市,锦江区</td>\n",
       "      <td>1993-10-17 00:00:00</td>\n",
       "      <td>A</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>74.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>1009994476791267328</td>\n",
       "      <td>230103197303300035</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1009985882989527000</td>\n",
       "      <td>None</td>\n",
       "      <td>4</td>\n",
       "      <td>广东省,深圳市,南山区</td>\n",
       "      <td>2014-05-26 00:00:00</td>\n",
       "      <td>B</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>80.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>1009995106528264192</td>\n",
       "      <td>12345667890123</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1009984490220552200</td>\n",
       "      <td>None</td>\n",
       "      <td>1</td>\n",
       "      <td>内蒙古自治区,呼和浩特市,新城区</td>\n",
       "      <td>1989-03-03 00:00:00</td>\n",
       "      <td></td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>85.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>1011170106216022016</td>\n",
       "      <td>310105198608070019</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1009991732802093000</td>\n",
       "      <td>None</td>\n",
       "      <td>1</td>\n",
       "      <td>湖北省,,</td>\n",
       "      <td>2010-01-27 00:00:00</td>\n",
       "      <td></td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>75.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>1011789508007428096</td>\n",
       "      <td>130604198602140013</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1009265808943087600</td>\n",
       "      <td>None</td>\n",
       "      <td>4</td>\n",
       "      <td>河北</td>\n",
       "      <td>1986-02-14 00:00:00</td>\n",
       "      <td>None</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>194.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>1011789508024205312</td>\n",
       "      <td>210203198711229516</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1009265808943087600</td>\n",
       "      <td>None</td>\n",
       "      <td>4</td>\n",
       "      <td>辽宁</td>\n",
       "      <td>1987-11-22 00:00:00</td>\n",
       "      <td>None</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>181.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>1011789508036788224</td>\n",
       "      <td>530102199304071119</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1009265808943087600</td>\n",
       "      <td>None</td>\n",
       "      <td>4</td>\n",
       "      <td>云南</td>\n",
       "      <td>1993-04-07 00:00:00</td>\n",
       "      <td>None</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>183.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>1011789508049371136</td>\n",
       "      <td>210204199302036479</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1009265808943087600</td>\n",
       "      <td>None</td>\n",
       "      <td>4</td>\n",
       "      <td>辽宁</td>\n",
       "      <td>1993-02-03 00:00:00</td>\n",
       "      <td>None</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>190.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>1011789508057759744</td>\n",
       "      <td>130604198607220311</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1009265808943087600</td>\n",
       "      <td>None</td>\n",
       "      <td>4</td>\n",
       "      <td>河北</td>\n",
       "      <td>1986-07-22 00:00:00</td>\n",
       "      <td>None</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>188.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7548</th>\n",
       "      <td>1032520261129207808</td>\n",
       "      <td>460036200007111220</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1013250085351325714</td>\n",
       "      <td>None</td>\n",
       "      <td>2</td>\n",
       "      <td>,,</td>\n",
       "      <td>2000-07-11 00:00:00</td>\n",
       "      <td>B</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>62.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7549</th>\n",
       "      <td>1032520261309562880</td>\n",
       "      <td>320281199701114524</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1013250085351325714</td>\n",
       "      <td>None</td>\n",
       "      <td>2</td>\n",
       "      <td>,,</td>\n",
       "      <td>1997-01-11 00:00:00</td>\n",
       "      <td>A</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>62.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7550</th>\n",
       "      <td>1032520261481529344</td>\n",
       "      <td>321322200111273826</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1013250085351325714</td>\n",
       "      <td>None</td>\n",
       "      <td>2</td>\n",
       "      <td>,,</td>\n",
       "      <td>2001-11-27 00:00:00</td>\n",
       "      <td>A</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>54.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7551</th>\n",
       "      <td>1032842201316458496</td>\n",
       "      <td>441581200512011753</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1013250085351325720</td>\n",
       "      <td>None</td>\n",
       "      <td>1</td>\n",
       "      <td>广东省,汕尾市,陆丰市</td>\n",
       "      <td>2005-12-01 00:00:00</td>\n",
       "      <td>-</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>60.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>4</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7552</th>\n",
       "      <td>1032842201501007872</td>\n",
       "      <td>440306200504131219</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1013250085351325720</td>\n",
       "      <td>None</td>\n",
       "      <td>1</td>\n",
       "      <td>广东省,深圳市,宝安区</td>\n",
       "      <td>2005-04-13 00:00:00</td>\n",
       "      <td>A</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>50.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>5</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7553</th>\n",
       "      <td>1032842201681362944</td>\n",
       "      <td>441402200501080410</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1013250085351325720</td>\n",
       "      <td>None</td>\n",
       "      <td>1</td>\n",
       "      <td>广东省,梅州市,梅县区</td>\n",
       "      <td>2005-01-08 00:00:00</td>\n",
       "      <td>AB</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>45.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>5</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7554</th>\n",
       "      <td>1032842201865912320</td>\n",
       "      <td>360124200511163018</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1013250085351325720</td>\n",
       "      <td>None</td>\n",
       "      <td>1</td>\n",
       "      <td>江西省,南昌市,进贤县</td>\n",
       "      <td>2005-11-16 00:00:00</td>\n",
       "      <td>未知</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>46.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>3</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7555</th>\n",
       "      <td>1032842202042073088</td>\n",
       "      <td>360981200602143312</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1013250085351325720</td>\n",
       "      <td>None</td>\n",
       "      <td>1</td>\n",
       "      <td>江西省,,</td>\n",
       "      <td>2006-02-14 00:00:00</td>\n",
       "      <td>未知</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>38.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>2</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7556</th>\n",
       "      <td>1032842202222428160</td>\n",
       "      <td>511724200510011452</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1013250085351325720</td>\n",
       "      <td>None</td>\n",
       "      <td>1</td>\n",
       "      <td>四川省,达州市,大竹县</td>\n",
       "      <td>2005-10-01 00:00:00</td>\n",
       "      <td>未知</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>38.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>1</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7557</th>\n",
       "      <td>1032842202398588928</td>\n",
       "      <td>440981200410257219</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1013250085351325720</td>\n",
       "      <td>None</td>\n",
       "      <td>1</td>\n",
       "      <td>广东省,茂名市,高州市</td>\n",
       "      <td>2004-10-25 00:00:00</td>\n",
       "      <td>未知</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>43.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>3</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7558</th>\n",
       "      <td>1032842202574749696</td>\n",
       "      <td>440306200503061212</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1013250085351325720</td>\n",
       "      <td>None</td>\n",
       "      <td>1</td>\n",
       "      <td>广东省,深圳市,宝安区</td>\n",
       "      <td>2005-03-06 00:00:00</td>\n",
       "      <td>未知</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>74.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>1</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7559</th>\n",
       "      <td>1032842202755104768</td>\n",
       "      <td>445222200607141814</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1013250085351325720</td>\n",
       "      <td>None</td>\n",
       "      <td>1</td>\n",
       "      <td>广东省,揭阳市,揭西县</td>\n",
       "      <td>2006-07-14 00:00:00</td>\n",
       "      <td>未知</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>45.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>3</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7560</th>\n",
       "      <td>1032842202935459840</td>\n",
       "      <td>360821200603092033</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1013250085351325720</td>\n",
       "      <td>None</td>\n",
       "      <td>1</td>\n",
       "      <td>江西省,吉安市,吉安县</td>\n",
       "      <td>2006-03-09 00:00:00</td>\n",
       "      <td>B</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>40.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>3</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7561</th>\n",
       "      <td>1032842203115814912</td>\n",
       "      <td>440982200508204752</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1013250085351325720</td>\n",
       "      <td>None</td>\n",
       "      <td>1</td>\n",
       "      <td>广东省,茂名市,化州市</td>\n",
       "      <td>2005-08-20 00:00:00</td>\n",
       "      <td>未知</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>45.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>1</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7562</th>\n",
       "      <td>1032842203300364288</td>\n",
       "      <td>420683200509054238</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1013250085351325720</td>\n",
       "      <td>None</td>\n",
       "      <td>1</td>\n",
       "      <td>湖北省,襄阳市,枣阳市</td>\n",
       "      <td>2005-09-05 00:00:00</td>\n",
       "      <td>AB</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>45.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>4</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7563</th>\n",
       "      <td>1032842203476525056</td>\n",
       "      <td>411325200502202911</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1013250085351325720</td>\n",
       "      <td>None</td>\n",
       "      <td>1</td>\n",
       "      <td>河南省,南阳市,唐河县</td>\n",
       "      <td>2005-02-20 00:00:00</td>\n",
       "      <td>未知</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>48.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>3</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7564</th>\n",
       "      <td>1032842203656880128</td>\n",
       "      <td>440306200603071418</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1013250085351325720</td>\n",
       "      <td>None</td>\n",
       "      <td>1</td>\n",
       "      <td>广东省,深圳市,宝安区</td>\n",
       "      <td>2006-03-07 00:00:00</td>\n",
       "      <td>未知</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>60.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>5</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7565</th>\n",
       "      <td>1032842203891761152</td>\n",
       "      <td>44051220050903001X</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1013250085351325720</td>\n",
       "      <td>None</td>\n",
       "      <td>1</td>\n",
       "      <td>广东省,深圳市,宝安区</td>\n",
       "      <td>2005-09-03 00:00:00</td>\n",
       "      <td>未知</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>47.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>5</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7566</th>\n",
       "      <td>1032842204072116224</td>\n",
       "      <td>361121200509118914</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1013250085351325720</td>\n",
       "      <td>None</td>\n",
       "      <td>1</td>\n",
       "      <td>江西省,上饶市,上饶县</td>\n",
       "      <td>2005-09-11 00:00:00</td>\n",
       "      <td>A</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>53.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>4</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7567</th>\n",
       "      <td>1032842204252471296</td>\n",
       "      <td>440982200604194072</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1013250085351325720</td>\n",
       "      <td>None</td>\n",
       "      <td>1</td>\n",
       "      <td>广东省,,</td>\n",
       "      <td>2006-04-19 00:00:00</td>\n",
       "      <td>未知</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>40.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>3</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7568</th>\n",
       "      <td>1032842204432826368</td>\n",
       "      <td>441481200511254395</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1013250085351325720</td>\n",
       "      <td>None</td>\n",
       "      <td>1</td>\n",
       "      <td>广东省,梅州市,兴宁市</td>\n",
       "      <td>2005-11-25 00:00:00</td>\n",
       "      <td>未知</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>40.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>1</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7569</th>\n",
       "      <td>1032842204608987136</td>\n",
       "      <td>411381200510231238</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1013250085351325720</td>\n",
       "      <td>None</td>\n",
       "      <td>1</td>\n",
       "      <td>河南省,南阳市,邓州市</td>\n",
       "      <td>2005-10-23 00:00:00</td>\n",
       "      <td>未知</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>45.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>5</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7570</th>\n",
       "      <td>1032842204793536512</td>\n",
       "      <td>440233200511070016</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1013250085351325720</td>\n",
       "      <td>None</td>\n",
       "      <td>1</td>\n",
       "      <td>广东省,深圳市,宝安区</td>\n",
       "      <td>2005-11-07 00:00:00</td>\n",
       "      <td>B</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>40.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>3</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7571</th>\n",
       "      <td>1032842204982280192</td>\n",
       "      <td>440981200502283510</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1013250085351325720</td>\n",
       "      <td>None</td>\n",
       "      <td>1</td>\n",
       "      <td>广东省,茂名市,高州市</td>\n",
       "      <td>2005-02-28 00:00:00</td>\n",
       "      <td>未知</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>40.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>1</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7572</th>\n",
       "      <td>1032879086273298432</td>\n",
       "      <td>EK065832</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1013250085351325715</td>\n",
       "      <td>None</td>\n",
       "      <td>2</td>\n",
       "      <td>,,</td>\n",
       "      <td>1992-09-10 00:00:00</td>\n",
       "      <td></td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>75.0</td>\n",
       "      <td>2018</td>\n",
       "      <td></td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7573</th>\n",
       "      <td>1032896930717368320</td>\n",
       "      <td>510107198901070513</td>\n",
       "      <td>2018-05-01</td>\n",
       "      <td>2020-04-30</td>\n",
       "      <td>1013250085351325709</td>\n",
       "      <td>None</td>\n",
       "      <td>2</td>\n",
       "      <td>四川省,成都市,武侯区</td>\n",
       "      <td>1989-01-07 00:00:00</td>\n",
       "      <td>B</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>75.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>2</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7574</th>\n",
       "      <td>1033912519070056448</td>\n",
       "      <td>310106199706161613</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1013250085351325727</td>\n",
       "      <td>None</td>\n",
       "      <td>0</td>\n",
       "      <td>,,</td>\n",
       "      <td>1997-06-16 00:00:00</td>\n",
       "      <td></td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>80.0</td>\n",
       "      <td>2018</td>\n",
       "      <td>1</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7575</th>\n",
       "      <td>1034270637624918016</td>\n",
       "      <td>320921200602030613</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1013250085351325727</td>\n",
       "      <td>None</td>\n",
       "      <td>0</td>\n",
       "      <td>江苏省,盐城市,响水县</td>\n",
       "      <td>2006-02-03 00:00:00</td>\n",
       "      <td></td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>65.0</td>\n",
       "      <td>2018</td>\n",
       "      <td></td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7576</th>\n",
       "      <td>1034272337597300736</td>\n",
       "      <td>654121199911300673</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1013250085351325732</td>\n",
       "      <td>None</td>\n",
       "      <td>1</td>\n",
       "      <td>,,</td>\n",
       "      <td>1999-11-03 00:00:00</td>\n",
       "      <td></td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>72.0</td>\n",
       "      <td>2018</td>\n",
       "      <td></td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7577</th>\n",
       "      <td>1034328091259830272</td>\n",
       "      <td>f</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>1013250085351325733</td>\n",
       "      <td>None</td>\n",
       "      <td>1</td>\n",
       "      <td>湖北省,潜江市,</td>\n",
       "      <td>1999-02-03 00:00:00</td>\n",
       "      <td></td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2.0</td>\n",
       "      <td>2018</td>\n",
       "      <td></td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>7578 rows × 60 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                    sysId                            IDNo agentBeginTime  \\\n",
       "0     1009265662469603328                  34678909876543            NaT   \n",
       "1     1009267469954580480                  39287372838382            NaT   \n",
       "2     1009267737471483904              610103198702040416            NaT   \n",
       "3     1009267990966829056           234567765432345678765            NaT   \n",
       "4     1009269512643543040              220106198711199011            NaT   \n",
       "5     1009270902338093056              230502198504021358            NaT   \n",
       "6     1009272806745047040                      6132232222            NaT   \n",
       "7     1009276079812116480              123456789012345678            NaT   \n",
       "8     1009277120326664192              150123199909091234            NaT   \n",
       "9     1009643157631008768                   1234567890123            NaT   \n",
       "10    1009643421545005056         11111111111111777777111            NaT   \n",
       "11    1009987242078240768  111111111111111111111111111111            NaT   \n",
       "12    1009987558286819328              210203199010164798            NaT   \n",
       "13    1009989471610863616              21020319850501221X            NaT   \n",
       "14    1009990012088877056              330103199701018888            NaT   \n",
       "15    1009990983330299904              210204198703160492            NaT   \n",
       "16    1009992440523784192              330681198606104853            NaT   \n",
       "17    1009992532274184192              110000198701010001            NaT   \n",
       "18    1009992843109859328              533221199703090330            NaT   \n",
       "19    1009993119082479616                 201212888895658            NaT   \n",
       "20    1009993391317975040                              ss            NaT   \n",
       "21    1009994098582487040              510108199310172412            NaT   \n",
       "22    1009994476791267328              230103197303300035            NaT   \n",
       "23    1009995106528264192                  12345667890123            NaT   \n",
       "24    1011170106216022016              310105198608070019            NaT   \n",
       "25    1011789508007428096              130604198602140013            NaT   \n",
       "26    1011789508024205312              210203198711229516            NaT   \n",
       "27    1011789508036788224              530102199304071119            NaT   \n",
       "28    1011789508049371136              210204199302036479            NaT   \n",
       "29    1011789508057759744             130604198607220311             NaT   \n",
       "...                   ...                             ...            ...   \n",
       "7548  1032520261129207808              460036200007111220            NaT   \n",
       "7549  1032520261309562880              320281199701114524            NaT   \n",
       "7550  1032520261481529344              321322200111273826            NaT   \n",
       "7551  1032842201316458496              441581200512011753            NaT   \n",
       "7552  1032842201501007872              440306200504131219            NaT   \n",
       "7553  1032842201681362944              441402200501080410            NaT   \n",
       "7554  1032842201865912320              360124200511163018            NaT   \n",
       "7555  1032842202042073088              360981200602143312            NaT   \n",
       "7556  1032842202222428160              511724200510011452            NaT   \n",
       "7557  1032842202398588928              440981200410257219            NaT   \n",
       "7558  1032842202574749696              440306200503061212            NaT   \n",
       "7559  1032842202755104768              445222200607141814            NaT   \n",
       "7560  1032842202935459840              360821200603092033            NaT   \n",
       "7561  1032842203115814912              440982200508204752            NaT   \n",
       "7562  1032842203300364288              420683200509054238            NaT   \n",
       "7563  1032842203476525056              411325200502202911            NaT   \n",
       "7564  1032842203656880128              440306200603071418            NaT   \n",
       "7565  1032842203891761152              44051220050903001X            NaT   \n",
       "7566  1032842204072116224              361121200509118914            NaT   \n",
       "7567  1032842204252471296              440982200604194072            NaT   \n",
       "7568  1032842204432826368              441481200511254395            NaT   \n",
       "7569  1032842204608987136              411381200510231238            NaT   \n",
       "7570  1032842204793536512              440233200511070016            NaT   \n",
       "7571  1032842204982280192              440981200502283510            NaT   \n",
       "7572  1032879086273298432                        EK065832            NaT   \n",
       "7573  1032896930717368320              510107198901070513     2018-05-01   \n",
       "7574  1033912519070056448              310106199706161613            NaT   \n",
       "7575  1034270637624918016              320921200602030613            NaT   \n",
       "7576  1034272337597300736              654121199911300673            NaT   \n",
       "7577  1034328091259830272                               f            NaT   \n",
       "\n",
       "     agentEndTime            assoSysId associationId  auditStatus  \\\n",
       "0             NaT  1009261008365027300          None            2   \n",
       "1             NaT  1009261008365027300          None            2   \n",
       "2             NaT  1009262229498888200          None            2   \n",
       "3             NaT  1009261008365027300          None            2   \n",
       "4             NaT  1009260988244951000          None            1   \n",
       "5             NaT  1009262105175523300          None            2   \n",
       "6             NaT  1009265808943087600          None            1   \n",
       "7             NaT  1009262105175523300          None            4   \n",
       "8             NaT  1009262105175523300          None            4   \n",
       "9             NaT  1009604003606036500          None            2   \n",
       "10            NaT  1009636512146391000          None            4   \n",
       "11            NaT  1009980810834477000          None            4   \n",
       "12            NaT  1009981097095725000          None            4   \n",
       "13            NaT  1009980874990551000          None            4   \n",
       "14            NaT  1009980874990551000          None            4   \n",
       "15            NaT  1009984612115415000          None            2   \n",
       "16            NaT  1009980874990551000          None            4   \n",
       "17            NaT  1009989650657312800          None            4   \n",
       "18            NaT  1009976772244013000          None            4   \n",
       "19            NaT  1009989650657312800          None            4   \n",
       "20            NaT  1009985882989527000          None            1   \n",
       "21            NaT  1009985052848685000          None            4   \n",
       "22            NaT  1009985882989527000          None            4   \n",
       "23            NaT  1009984490220552200          None            1   \n",
       "24            NaT  1009991732802093000          None            1   \n",
       "25            NaT  1009265808943087600          None            4   \n",
       "26            NaT  1009265808943087600          None            4   \n",
       "27            NaT  1009265808943087600          None            4   \n",
       "28            NaT  1009265808943087600          None            4   \n",
       "29            NaT  1009265808943087600          None            4   \n",
       "...           ...                  ...           ...          ...   \n",
       "7548          NaT  1013250085351325714          None            2   \n",
       "7549          NaT  1013250085351325714          None            2   \n",
       "7550          NaT  1013250085351325714          None            2   \n",
       "7551          NaT  1013250085351325720          None            1   \n",
       "7552          NaT  1013250085351325720          None            1   \n",
       "7553          NaT  1013250085351325720          None            1   \n",
       "7554          NaT  1013250085351325720          None            1   \n",
       "7555          NaT  1013250085351325720          None            1   \n",
       "7556          NaT  1013250085351325720          None            1   \n",
       "7557          NaT  1013250085351325720          None            1   \n",
       "7558          NaT  1013250085351325720          None            1   \n",
       "7559          NaT  1013250085351325720          None            1   \n",
       "7560          NaT  1013250085351325720          None            1   \n",
       "7561          NaT  1013250085351325720          None            1   \n",
       "7562          NaT  1013250085351325720          None            1   \n",
       "7563          NaT  1013250085351325720          None            1   \n",
       "7564          NaT  1013250085351325720          None            1   \n",
       "7565          NaT  1013250085351325720          None            1   \n",
       "7566          NaT  1013250085351325720          None            1   \n",
       "7567          NaT  1013250085351325720          None            1   \n",
       "7568          NaT  1013250085351325720          None            1   \n",
       "7569          NaT  1013250085351325720          None            1   \n",
       "7570          NaT  1013250085351325720          None            1   \n",
       "7571          NaT  1013250085351325720          None            1   \n",
       "7572          NaT  1013250085351325715          None            2   \n",
       "7573   2020-04-30  1013250085351325709          None            2   \n",
       "7574          NaT  1013250085351325727          None            0   \n",
       "7575          NaT  1013250085351325727          None            0   \n",
       "7576          NaT  1013250085351325732          None            1   \n",
       "7577          NaT  1013250085351325733          None            1   \n",
       "\n",
       "            birthPlace             birthday bloodType      ...       teamId  \\\n",
       "0          山东省,青岛市,李沧区  1980-02-07 00:00:00                ...         None   \n",
       "1          山东省,德州市,庆云县  2018-06-01 00:00:00         B      ...         None   \n",
       "2          陕西省,西安市,碑林区  1987-02-04 00:00:00         B      ...         None   \n",
       "3          山东省,青岛市,李沧区  1999-02-26 00:00:00                ...         None   \n",
       "4          吉林省,长春市,宽城区  2018-05-03 00:00:00        AB      ...         None   \n",
       "5        黑龙江省,双鸭山市,尖山区  1985-04-02 00:00:00         B      ...         None   \n",
       "6         河北省,石家庄市,裕华区  1992-07-16 00:00:00         A      ...         None   \n",
       "7     内蒙古自治区,呼和浩特市,新城区  2018-06-01 00:00:00         O      ...         None   \n",
       "8     内蒙古自治区,呼和浩特市,新城区  1999-09-09 00:00:00         -      ...         None   \n",
       "9          广东省,深圳市,福田区  1978-03-27 00:00:00         O      ...         None   \n",
       "10        江西省,景德镇市,昌江区  2018-06-01 00:00:00                ...         None   \n",
       "11         安徽省,蚌埠市,淮上区  2018-06-01 00:00:00         A      ...         None   \n",
       "12         辽宁省,大连市,西岗区  1990-10-16 00:00:00         A      ...         None   \n",
       "13         辽宁省,大连市,西岗区  1985-05-05 00:00:00         A      ...         None   \n",
       "14         浙江省,杭州市,余杭区  1997-01-01 00:00:00        AB      ...         None   \n",
       "15        辽宁省,大连市,沙河口区  1987-08-16 00:00:00                ...         None   \n",
       "16         浙江省,绍兴市,诸暨市  1986-06-10 00:00:00         O      ...         None   \n",
       "17         广东省,广州市,荔湾区  1987-01-01 00:00:00         A      ...         None   \n",
       "18    云南省,丽江市,玉龙纳西族自治县  1997-03-09 00:00:00         B      ...         None   \n",
       "19         广东省,韶关市,浈江区  1988-04-01 00:00:00         O      ...         None   \n",
       "20      天津市,天津市市辖区,河东区  2018-06-04 00:00:00         A      ...         None   \n",
       "21         四川省,成都市,锦江区  1993-10-17 00:00:00         A      ...         None   \n",
       "22         广东省,深圳市,南山区  2014-05-26 00:00:00         B      ...         None   \n",
       "23    内蒙古自治区,呼和浩特市,新城区  1989-03-03 00:00:00                ...         None   \n",
       "24               湖北省,,  2010-01-27 00:00:00                ...         None   \n",
       "25                  河北  1986-02-14 00:00:00      None      ...         None   \n",
       "26                  辽宁  1987-11-22 00:00:00      None      ...         None   \n",
       "27                  云南  1993-04-07 00:00:00      None      ...         None   \n",
       "28                  辽宁  1993-02-03 00:00:00      None      ...         None   \n",
       "29                  河北  1986-07-22 00:00:00      None      ...         None   \n",
       "...                ...                  ...       ...      ...          ...   \n",
       "7548                ,,  2000-07-11 00:00:00         B      ...         None   \n",
       "7549                ,,  1997-01-11 00:00:00         A      ...         None   \n",
       "7550                ,,  2001-11-27 00:00:00         A      ...         None   \n",
       "7551       广东省,汕尾市,陆丰市  2005-12-01 00:00:00         -      ...         None   \n",
       "7552       广东省,深圳市,宝安区  2005-04-13 00:00:00         A      ...         None   \n",
       "7553       广东省,梅州市,梅县区  2005-01-08 00:00:00        AB      ...         None   \n",
       "7554       江西省,南昌市,进贤县  2005-11-16 00:00:00        未知      ...         None   \n",
       "7555             江西省,,  2006-02-14 00:00:00        未知      ...         None   \n",
       "7556       四川省,达州市,大竹县  2005-10-01 00:00:00        未知      ...         None   \n",
       "7557       广东省,茂名市,高州市  2004-10-25 00:00:00        未知      ...         None   \n",
       "7558       广东省,深圳市,宝安区  2005-03-06 00:00:00        未知      ...         None   \n",
       "7559       广东省,揭阳市,揭西县  2006-07-14 00:00:00        未知      ...         None   \n",
       "7560       江西省,吉安市,吉安县  2006-03-09 00:00:00         B      ...         None   \n",
       "7561       广东省,茂名市,化州市  2005-08-20 00:00:00        未知      ...         None   \n",
       "7562       湖北省,襄阳市,枣阳市  2005-09-05 00:00:00        AB      ...         None   \n",
       "7563       河南省,南阳市,唐河县  2005-02-20 00:00:00        未知      ...         None   \n",
       "7564       广东省,深圳市,宝安区  2006-03-07 00:00:00        未知      ...         None   \n",
       "7565       广东省,深圳市,宝安区  2005-09-03 00:00:00        未知      ...         None   \n",
       "7566       江西省,上饶市,上饶县  2005-09-11 00:00:00         A      ...         None   \n",
       "7567             广东省,,  2006-04-19 00:00:00        未知      ...         None   \n",
       "7568       广东省,梅州市,兴宁市  2005-11-25 00:00:00        未知      ...         None   \n",
       "7569       河南省,南阳市,邓州市  2005-10-23 00:00:00        未知      ...         None   \n",
       "7570       广东省,深圳市,宝安区  2005-11-07 00:00:00         B      ...         None   \n",
       "7571       广东省,茂名市,高州市  2005-02-28 00:00:00        未知      ...         None   \n",
       "7572                ,,  1992-09-10 00:00:00                ...         None   \n",
       "7573       四川省,成都市,武侯区  1989-01-07 00:00:00         B      ...         None   \n",
       "7574                ,,  1997-06-16 00:00:00                ...         None   \n",
       "7575       江苏省,盐城市,响水县  2006-02-03 00:00:00                ...         None   \n",
       "7576                ,,  1999-11-03 00:00:00                ...         None   \n",
       "7577          湖北省,潜江市,  1999-02-03 00:00:00                ...         None   \n",
       "\n",
       "     transferSituation weight years  playerpositions clubSysId ytcSysId  \\\n",
       "0                  NaN   34.0  2018             None      None     None   \n",
       "1                  NaN  500.0  2018             None      None     None   \n",
       "2                  NaN   80.0  2018             None      None     None   \n",
       "3                  NaN  123.0  2018             None      None     None   \n",
       "4                  NaN   60.0  2018             None      None     None   \n",
       "5                  NaN  600.0  2018             None      None     None   \n",
       "6                  NaN   72.0  2018             None      None     None   \n",
       "7                  NaN   80.0  2018             None      None     None   \n",
       "8                  NaN   80.0  2018             None      None     None   \n",
       "9                  NaN   70.0  2018             None      None     None   \n",
       "10                 NaN   90.0  2018             None      None     None   \n",
       "11                 NaN   80.0  2018             None      None     None   \n",
       "12                 NaN   74.0  2018             None      None     None   \n",
       "13                 NaN   69.0  2018             None      None     None   \n",
       "14                 NaN   80.0  2018             None      None     None   \n",
       "15                 NaN   76.0  2018             None      None     None   \n",
       "16                 NaN   64.0  2018             None      None     None   \n",
       "17                 NaN  100.0  2018             None      None     None   \n",
       "18                 NaN  144.0  2018             None      None     None   \n",
       "19                 NaN   85.0  2018             None      None     None   \n",
       "20                 NaN   22.0  2018             None      None     None   \n",
       "21                 NaN   74.0  2018             None      None     None   \n",
       "22                 NaN   80.0  2018             None      None     None   \n",
       "23                 NaN   85.0  2018             None      None     None   \n",
       "24                 NaN   75.0  2018             None      None     None   \n",
       "25                 NaN  194.0  2018             None      None     None   \n",
       "26                 NaN  181.0  2018             None      None     None   \n",
       "27                 NaN  183.0  2018             None      None     None   \n",
       "28                 NaN  190.0  2018             None      None     None   \n",
       "29                 NaN  188.0  2018             None      None     None   \n",
       "...                ...    ...   ...              ...       ...      ...   \n",
       "7548               NaN   62.0  2018             None      None     None   \n",
       "7549               NaN   62.0  2018             None      None     None   \n",
       "7550               NaN   54.0  2018             None      None     None   \n",
       "7551               NaN   60.0  2018                4      None     None   \n",
       "7552               NaN   50.0  2018                5      None     None   \n",
       "7553               NaN   45.0  2018                5      None     None   \n",
       "7554               NaN   46.0  2018                3      None     None   \n",
       "7555               NaN   38.0  2018                2      None     None   \n",
       "7556               NaN   38.0  2018                1      None     None   \n",
       "7557               NaN   43.0  2018                3      None     None   \n",
       "7558               NaN   74.0  2018                1      None     None   \n",
       "7559               NaN   45.0  2018                3      None     None   \n",
       "7560               NaN   40.0  2018                3      None     None   \n",
       "7561               NaN   45.0  2018                1      None     None   \n",
       "7562               NaN   45.0  2018                4      None     None   \n",
       "7563               NaN   48.0  2018                3      None     None   \n",
       "7564               NaN   60.0  2018                5      None     None   \n",
       "7565               NaN   47.0  2018                5      None     None   \n",
       "7566               NaN   53.0  2018                4      None     None   \n",
       "7567               NaN   40.0  2018                3      None     None   \n",
       "7568               NaN   40.0  2018                1      None     None   \n",
       "7569               NaN   45.0  2018                5      None     None   \n",
       "7570               NaN   40.0  2018                3      None     None   \n",
       "7571               NaN   40.0  2018                1      None     None   \n",
       "7572               NaN   75.0  2018                       None     None   \n",
       "7573               NaN   75.0  2018                2      None     None   \n",
       "7574               NaN   80.0  2018                1      None     None   \n",
       "7575               NaN   65.0  2018                       None     None   \n",
       "7576               NaN   72.0  2018                       None     None   \n",
       "7577               NaN    2.0  2018                       None     None   \n",
       "\n",
       "      deputyId footballOrgCode boneAgeResult  \n",
       "0         None            None                \n",
       "1         None            None                \n",
       "2         None            None                \n",
       "3         None            None                \n",
       "4         None            None                \n",
       "5         None            None                \n",
       "6         None            None            00  \n",
       "7         None            None            23  \n",
       "8         None            None            合格  \n",
       "9         None            None                \n",
       "10        None            None                \n",
       "11        None            None                \n",
       "12        None            None                \n",
       "13        None            None                \n",
       "14        None            None                \n",
       "15        None            None                \n",
       "16        None          S20002            通过  \n",
       "17        None            None                \n",
       "18        None            None            18  \n",
       "19        None            None                \n",
       "20        None            None                \n",
       "21        None            None                \n",
       "22        None            None                \n",
       "23        None            None                \n",
       "24        None            None                \n",
       "25        None            None          None  \n",
       "26        None            None          None  \n",
       "27        None            None          None  \n",
       "28        None            None          None  \n",
       "29        None            None          None  \n",
       "...        ...             ...           ...  \n",
       "7548      None            None          None  \n",
       "7549      None            None          None  \n",
       "7550      None            None          None  \n",
       "7551      None            None          None  \n",
       "7552      None            None          None  \n",
       "7553      None            None          None  \n",
       "7554      None            None          None  \n",
       "7555      None            None          None  \n",
       "7556      None            None          None  \n",
       "7557      None            None          None  \n",
       "7558      None            None          None  \n",
       "7559      None            None          None  \n",
       "7560      None            None          None  \n",
       "7561      None            None          None  \n",
       "7562      None            None          None  \n",
       "7563      None            None          None  \n",
       "7564      None            None          None  \n",
       "7565      None            None          None  \n",
       "7566      None            None          None  \n",
       "7567      None            None          None  \n",
       "7568      None            None          None  \n",
       "7569      None            None          None  \n",
       "7570      None            None          None  \n",
       "7571      None            None          None  \n",
       "7572      None            None                \n",
       "7573      None            None                \n",
       "7574      None            None                \n",
       "7575      None            None                \n",
       "7576      None            None                \n",
       "7577      None            None            12  \n",
       "\n",
       "[7578 rows x 60 columns]"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pymysql\n",
    "import pandas as pd\n",
    "\n",
    "# 1. Connect Python to the database\n",
    "engine = pymysql.connect(host='172.16.9.102',\n",
    "                         port=3306,\n",
    "                         user='root',\n",
    "                         password='zuxie@123',\n",
    "                         db='uip',             # database name\n",
    "                         charset='utf8'\n",
    "                         )\n",
    "sql = 'select * from t_uip_p_player;'\n",
    "\n",
    "# 2. Run the SQL statement and load the result into a DataFrame\n",
    "DF1 = pd.read_sql(sql, engine)\n",
    "DF1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Getting familiar with the data\n",
    "## Previewing the first few rows"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>PassengerId</th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Braund, Mr. Owen Harris</td>\n",
       "      <td>male</td>\n",
       "      <td>22.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>A/5 21171</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17599</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C85</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>Heikkinen, Miss. Laina</td>\n",
       "      <td>female</td>\n",
       "      <td>26.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>STON/O2. 3101282</td>\n",
       "      <td>7.9250</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td>\n",
       "      <td>female</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>113803</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>C123</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Allen, Mr. William Henry</td>\n",
       "      <td>male</td>\n",
       "      <td>35.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>373450</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   PassengerId  Survived  Pclass  \\\n",
       "0            1         0       3   \n",
       "1            2         1       1   \n",
       "2            3         1       3   \n",
       "3            4         1       1   \n",
       "4            5         0       3   \n",
       "\n",
       "                                                Name     Sex   Age  SibSp  \\\n",
       "0                            Braund, Mr. Owen Harris    male  22.0      1   \n",
       "1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   \n",
       "2                             Heikkinen, Miss. Laina  female  26.0      0   \n",
       "3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   \n",
       "4                           Allen, Mr. William Henry    male  35.0      0   \n",
       "\n",
       "   Parch            Ticket     Fare Cabin Embarked  \n",
       "0      0         A/5 21171   7.2500   NaN        S  \n",
       "1      0          PC 17599  71.2833   C85        C  \n",
       "2      0  STON/O2. 3101282   7.9250   NaN        S  \n",
       "3      0            113803  53.1000  C123        S  \n",
       "4      0            373450   8.0500   NaN        S  "
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Basic import ('ansi' is a Windows-only codec alias)\n",
    "DF1 = pd.read_csv(\"train.csv\",encoding=\"ansi\")\n",
    "DF1.head()"
   ]
  },
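  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Getting the size of the table\n",
    "The `shape` attribute (it is not a method, so no parentheses) returns the table's row and column counts as a tuple."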
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(891, 12)"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "DF1.shape  # (891, 12): 891 rows, 12 columns"
   ]
  },
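  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Getting the data types\n",
    "The second step in getting familiar with the data is to check the data types, because different types call for different analyses. For example:  \n",
    "1. Numeric columns can be averaged directly.  \n",
    "2. String columns cannot be averaged."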
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 891 entries, 0 to 890\n",
      "Data columns (total 12 columns):\n",
      "PassengerId    891 non-null int64\n",
      "Survived       891 non-null int64\n",
      "Pclass         891 non-null int64\n",
      "Name           891 non-null object\n",
      "Sex            891 non-null object\n",
      "Age            714 non-null float64\n",
      "SibSp          891 non-null int64\n",
      "Parch          891 non-null int64\n",
      "Ticket         891 non-null object\n",
      "Fare           891 non-null float64\n",
      "Cabin          204 non-null object\n",
      "Embarked       889 non-null object\n",
      "dtypes: float64(2), int64(5), object(5)\n",
      "memory usage: 83.6+ KB\n"
     ]
    }
   ],
   "source": [
    "# info() prints the data type and non-null count of every column in the table.\n",
    "DF1.info()"
   ]
  },
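  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Getting the distribution of numeric values\n",
    "The third step in getting familiar with the data is to understand how the values are distributed:  \n",
    "what are the mean, the minimum and maximum, the spread (standard deviation), and the quantiles?"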
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>PassengerId</th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Fare</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>891.000000</td>\n",
       "      <td>891.000000</td>\n",
       "      <td>891.000000</td>\n",
       "      <td>714.000000</td>\n",
       "      <td>891.000000</td>\n",
       "      <td>891.000000</td>\n",
       "      <td>891.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>446.000000</td>\n",
       "      <td>0.383838</td>\n",
       "      <td>2.308642</td>\n",
       "      <td>29.699118</td>\n",
       "      <td>0.523008</td>\n",
       "      <td>0.381594</td>\n",
       "      <td>32.204208</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>257.353842</td>\n",
       "      <td>0.486592</td>\n",
       "      <td>0.836071</td>\n",
       "      <td>14.526497</td>\n",
       "      <td>1.102743</td>\n",
       "      <td>0.806057</td>\n",
       "      <td>49.693429</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>1.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>0.420000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>223.500000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>2.000000</td>\n",
       "      <td>20.125000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>7.910400</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>446.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>3.000000</td>\n",
       "      <td>28.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>14.454200</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>668.500000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>3.000000</td>\n",
       "      <td>38.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>31.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>891.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>3.000000</td>\n",
       "      <td>80.000000</td>\n",
       "      <td>8.000000</td>\n",
       "      <td>6.000000</td>\n",
       "      <td>512.329200</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       PassengerId    Survived      Pclass         Age       SibSp  \\\n",
       "count   891.000000  891.000000  891.000000  714.000000  891.000000   \n",
       "mean    446.000000    0.383838    2.308642   29.699118    0.523008   \n",
       "std     257.353842    0.486592    0.836071   14.526497    1.102743   \n",
       "min       1.000000    0.000000    1.000000    0.420000    0.000000   \n",
       "25%     223.500000    0.000000    2.000000   20.125000    0.000000   \n",
       "50%     446.000000    0.000000    3.000000   28.000000    0.000000   \n",
       "75%     668.500000    1.000000    3.000000   38.000000    1.000000   \n",
       "max     891.000000    1.000000    3.000000   80.000000    8.000000   \n",
       "\n",
       "            Parch        Fare  \n",
       "count  891.000000  891.000000  \n",
       "mean     0.381594   32.204208  \n",
       "std      0.806057   49.693429  \n",
       "min      0.000000    0.000000  \n",
       "25%      0.000000    7.910400  \n",
       "50%      0.000000   14.454200  \n",
       "75%      0.000000   31.000000  \n",
       "max      6.000000  512.329200  "
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 在python中只需使用describe()方法就可以获取所有数值类型字段的分布值。\n",
    "DF1.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "因为表DF1中只有年龄这一列是数值类型，  \n",
    "所以调用describe()方法时，只计算了年龄这一列的相关数值分布情况。  \n",
    "新建一个含有多列数值类型字段的DataFrame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "   年龄    收入  家属数\n",
      "0  20  5000    2\n",
      "1  25  8000    3\n",
      "2  30  9000    3\n",
      "3  28  7000    2\n",
      "              年龄           收入      家属数\n",
      "count   4.000000     4.000000  4.00000\n",
      "mean   25.750000  7250.000000  2.50000\n",
      "std     4.349329  1707.825128  0.57735\n",
      "min    20.000000  5000.000000  2.00000\n",
      "25%    23.750000  6500.000000  2.00000\n",
      "50%    26.500000  7500.000000  2.50000\n",
      "75%    28.500000  8250.000000  3.00000\n",
      "max    30.000000  9000.000000  3.00000\n"
     ]
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "DF = pd.DataFrame([[20,5000,2],[25,8000,3],[30,9000,3],[28,7000,2]],\n",
    "            columns = [\"年龄\",\"收入\",\"家属数\"])\n",
    "print(DF)\n",
    "print(DF.describe())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "上表中的年龄、收入、家属数都是数值类型，  \n",
    "所以在调用describe()方法的时候，会同时计算这三列的数值分布情况。"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": true
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
